Performance Benefits of Special-Purpose Instructions in the CSI Architecture

Authors

  • Dmitry Cheresiz
  • Ben Juurlink
  • Stamatis Vassiliadis
Abstract

The Complex Streamed Instruction Set (CSI) architecture was proposed to overcome the limitations of existing multimedia-oriented ISA extensions, such as Intel’s MMX and SSE. One of the main limitations is the large number of instructions that have to be executed. In CSI, instructions operate on data streams of arbitrary length, which dramatically reduces the instruction count for kernels with a sufficient amount of data-level parallelism. Previously, we have shown that CSI provides impressive performance improvements for several important multimedia kernels and applications. In those experiments the kernels were coded using elementary arithmetic CSI instructions such as addition and multiplication. Many kernels, however, perform more complex operations and thus need to be translated into multiple elementary CSI instructions. For some kernels, CSI provides special-purpose instructions that collapse several elementary operations into a single complex one and thereby achieve an additional reduction in the number of executed instructions. In this paper we study the performance provided by one such instruction. Using the SimpleScalar simulator, we evaluate the performance of a superscalar CPU augmented with a CSI execution unit on the Paeth prediction kernel of the image encoder/decoder for the PNG image compression standard. Simulation results show that the kernel-level performance of the 4-way superscalar CPU augmented with the CSI execution unit improves by a factor of 12.9 when this kernel is implemented using the special-purpose CSI instruction instead of the elementary ones.

Keywords: Processor Architecture, Multimedia ISA Extensions, Data-Level Parallelism
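To illustrate why the Paeth kernel maps onto several elementary operations per element, the following is a minimal scalar sketch of the Paeth predictor as defined for PNG filter type 4. It is not the CSI instruction or the authors' code; the function names `paeth_predictor`, `paeth_filter_row`, and `paeth_unfilter_row` and the `bpp` (bytes per pixel) parameter are chosen here for illustration only.

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """Paeth predictor from the PNG specification.

    a = byte to the left, b = byte above, c = byte above-left.
    Returns whichever of a, b, c is closest to the estimate a + b - c,
    breaking ties in the order a, b, c.
    """
    p = a + b - c          # initial linear estimate
    pa = abs(p - a)        # distance from the estimate to each neighbour
    pb = abs(p - b)
    pc = abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def paeth_filter_row(row: bytes, prev: bytes, bpp: int = 1) -> bytes:
    """Encoder side: subtract the prediction from each byte (mod 256)."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0    # left neighbour (raw)
        b = prev[i]                            # neighbour above
        c = prev[i - bpp] if i >= bpp else 0   # neighbour above-left
        out[i] = (x - paeth_predictor(a, b, c)) & 0xFF
    return bytes(out)


def paeth_unfilter_row(filt: bytes, prev: bytes, bpp: int = 1) -> bytes:
    """Decoder side: add the prediction back to recover the raw bytes."""
    out = bytearray(len(filt))
    for i, x in enumerate(filt):
        a = out[i - bpp] if i >= bpp else 0    # left neighbour (already decoded)
        b = prev[i]
        c = prev[i - bpp] if i >= bpp else 0
        out[i] = (x + paeth_predictor(a, b, c)) & 0xFF
    return bytes(out)
```

Each output byte costs an add, a subtract, three absolute differences, and two or three comparisons with a selection. That is the kind of multi-operation sequence the abstract describes a special-purpose instruction as collapsing into one.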

Similar resources

Performance of the Complex Streamed Instruction Set on Image Processing Kernels

The Complex Streamed Instruction (CSI) set is an architectural paradigm designed to accelerate multimedia applications. These applications are characterized by streaming operations on small-width data elements such as 8-bit pixels or 16-bit audio samples. CSI instructions operate on two-dimensional data streams in a SIMD fashion and are able to process streams of arbitrary length. In this paper...

Performance Scalability of Multimedia Instruction Set Extensions

Current media ISA extensions such as Sun’s VIS consist of SIMD-like instructions that operate on short vector registers. In order to exploit more parallelism in a superscalar processor provided with such instructions, the issue width has to be increased. In the Complex Streamed Instruction (CSI) set exploiting more parallelism does not involve issuing more instructions. In this paper we study h...

Implementation of a Streaming Execution Unit

The Complex Streamed Instruction (CSI) set is an ISA extension targeted at multimedia applications. CSI instructions process two-dimensional data streams stored in memory, performing sectioning, data alignment and conversion between different packed data types all in hardware. It has been shown previously that CSI provides significant speedups compared to current media ISA extensions such as MM...

SV: Enhancing SIMD Architectures via Combined SIMD-Vector Approach

SIMD architectures are ubiquitous in general-purpose and embedded processors to achieve future multimedia performance goals. However, limited by on-chip resources and off-chip memory bandwidth, current SIMD extensions only work on short sets of SIMD elements. This leads to large parallelization overhead, such as loop handling and address generation, for small loops in multimedia applications. Thi...

Motion Video Instruction Extensions for Alpha

The Alpha architecture[1] has added several new instructions called Motion Video Instructions (MVI) that can be used to accelerate the performance of key algorithms used in emerging motion video technologies. The criteria for selecting these new instructions are based on the fundamental premise that it is important to keep the Alpha architecture “clean” in order to facilitate extremely fast cir...

Publication date: 2002